Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

interaction:fixed effects and corrections modified #5928

Merged
merged 6 commits into from
Apr 29, 2024

Conversation

renu-pal
Copy link
Contributor

@renu-pal renu-pal commented Apr 2, 2024

FOR CONTRIBUTOR:
[ *] I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.

  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
    [ *] This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @renu-pal.

One question inline. Please also bump the tool version.

@@ -55,7 +55,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
<inputs>
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/>
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/>
<param argument="--fixed_effects" type="select" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects">
<param argument="--fixed_effects" type="text" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects">
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why should this be a text paraeter? We should probably fix the help instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that now but if more association factors get added in the future, we might have to make changes again as we are doing now. So instead take it as a text parameter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, the fixed_effects as well as the random_effects should be of type: data_column (see:

<param name="cols" type="data_column" data_ref="input" multiple="true"
for an example).
Since for both multiple columns from the input can be chosen. The original optional values are nonsense, since they hard-coded the column names of the test data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am getting column numbers instead of names after using type="data_column" . Do we have any other type where we can get names of columns directly instead of numbers?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried setting numerical=false? https://docs.galaxyproject.org/en/latest/dev/schema.html#id51

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setting numerical="false" did not work instead setting use_header_names="true" worked and UI shows column names now but the test cases are failing. Do not understand why it is still accepting numerical values as valid options for tests. So working on that now.

tool_test_output.json

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong, but the use_header_names="true" option is really just cosmetic and for the UI, you still need to give numbers in the tests.

Very good idea to use use_header_names="true"!!!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @renu-pal it was a surprise for me, that there is not easy way to get the column_names available for the command line from Galaxy, however after some fideling I found a way to get them from the input file:

## get column names of fixed and random effect from the input file, since galaxy 
## can only return indices with type="data_column" 

## get header
#set $input = open(str($input_metadata), 'r')
#set $header = $input.readlines()[0].split('\t')

## get fixed effects
#set $fixed_effects_val = []
#for $i in $fixed_effects:
    #silent $fixed_effects_val.append($header[int($i)])
#end for
#set $fixed_effects = ','.join($fixed_effects_val)

## get random effects
#set $random_effects_val = []
#for $i in $random_effects:
    #silent $random_effects_val.append($header[int($i)])
#end for
#set $random_effects = ','.join($random_effects_val)

Maybe let the user know in the help, that the column names must be in the first line of the table or modify the code to skip comments when parsing the header

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

forget about that, we need to get the column names in the command line not the cheetah code...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So update, this way, using awk the file is only parsed on script execution

## get column names of fixed and random effect from the input file, since galaxy 
## can only return indices with type="data_column" 
## using awk so that the file is only parsed on command line execution

#set idx = []
#for $i in $fixed_effects:
    #silent idx.append(f'${i}')
#end for
#set idx_for_awk = ','.join(idx)

fixed_effects=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` &&
echo 'Assigned fixed effects as:' \$fixed_effects &&

#set idx = []
#for $i in $random_effects:
    #silent idx.append(f'${i}')
#end for
#set idx_for_awk = ','.join(idx)

random_effects=`awk -v OFS=',' -F"\t" 'NR == 1 { print $idx_for_awk}' '$input_metadata'` &&
echo 'Assigned random effects as:' \$random_effects &&

tools/maaslin2/maaslin2.xml Outdated Show resolved Hide resolved
Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot @renu-pal ! Ones this is fixed, the tool is finally functional !
Ones they update the bioconda recipe: bioconda/bioconda-recipes#46915 we can also update the wrapper, but lets do this one first !

@@ -55,7 +55,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
<inputs>
<param name="input_data" type="data" format="tabular" label="Data (or features) file"/>
<param name="input_metadata" type="data" format="tabular" label="Metadata file"/>
<param argument="--fixed_effects" type="select" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects">
<param argument="--fixed_effects" type="text" multiple="true" optional="true" label="Interactions: Fixed effects" help="The fixed effects for the model, comma-delimited for multiple effects">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, the fixed_effects as well as the random_effects should be of type: data_column (see:

<param name="cols" type="data_column" data_ref="input" multiple="true"
for an example).
Since for both multiple columns from the input can be chosen. The original optional values are nonsense, since they hard-coded the column names of the test data.

<option value="hochberg">hochberg</option>
<option value="hommel">hommel</option>
<option value="bonferroni">bonferroni</option>
<option value="BH">BH</option>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the full names for BH and BY as well, maybe also add a link to the explanation to the help and check if they all work, maybe add a test or two as well

@paulzierep
Copy link
Contributor

Thanks @renu-pal.

One question inline. Please also bump the tool version.

@renu-pal that means you need to increase the VERSION_SUFFIX by 1: https://github.com/renu-pal/tools-iuc/blob/ea393cd20a2651a554f59c8afa1620b5792951f8/tools/maaslin2/macros.xml#L4

The TOOL_VERSION can be increased to match the latest bioconde recipe ones its updated: https://forum.biobakery.org/t/update-maaslin2-bioconda-recipe/6830 and bioconda/bioconda-recipes#46915

Copy link
Contributor

@paulzierep paulzierep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update to work with no columns selected

@@ -1,6 +1,6 @@
<?xml version="1.0"?>
<macros>
<token name="@TOOL_VERSION@">0.99.12</token>
<token name="@TOOL_VERSION@">1.7.3</token>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atm we are still at 0.99.12, no ?

## can only return indices with type="data_column"
## using awk so that the file is only parsed on command line execution

#set idx = []
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@renu-pal you said Yh it did a bit, but it's returning the complete list in case of 'random effect " when user does not choose any option
Could you try to fix that here, i.e. add an if statement, that does not choose any columns if no columns are specified by the user - and also add a test for it ...?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, working on that

@renu-pal
Copy link
Contributor Author

warning : the correction methods other than BH and BY will not work due to older version of maaslin.

@bgruening
Copy link
Member

@renu-pal the newer version of https://anaconda.org/bioconda/maaslin2/files is online.

@renu-pal
Copy link
Contributor Author

Thanks for the update @bgruening ! I will bump the version to 1.16.0 but will need some time to push the updated version as test cases are failing because of the version update. So I need to look into that and make necessary changes.

@paulzierep
Copy link
Contributor

Since for the current version the wrapper is broken (not logically usable), would it make sense to merge this version first and then bump to the next version in a new PR ? @bgruening any preference ?

@@ -245,7 +278,7 @@ cd outputFolder && mkdir -p figures/ && cp *.pdf figures
<test expect_num_outputs="5">
<param name="input_data" value="HMP2_taxonomy.tsv"/>
<param name="input_metadata" value="HMP2_metadata.tsv"/>
<param name="fixed_effects" value="diagnosis,dysbiosisnonIBD"/>
<param name="fixed_effects" value="c4: diagnosis,c9: dysbiosisnonIBD"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mvdbeek does this look like a regression in Galaxy to you? I thought that either the column name or the c4 should work, but in combination, it looks wired to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The combination is not necessary. I mentioned header names in test cases to make it easier for us to recognize which values we are taking. I will change it to just numbers.

@renu-pal
Copy link
Contributor Author

I changed the values to just numbers. I was wondering if I should disable the other options other than BY and BH in correction methods so the user does not choose it until we do not update the version.

@bgruening
Copy link
Member

@renu-pal it was a good idea. I was just confused and learned now that this is also possible. If you like you can add a comment to say what those columns are. Overall this looks like its ready to go. I hope I fixed the failing test.

@renu-pal
Copy link
Contributor Author

Ready to go from my end too!!

@@ -347,6 +474,9 @@ Output
- It only includes associations with q-values <= to the threshold.
- Data frame with residuals for each feature (R data file)
- This file contains a data frame with residuals for each feature.

Correction methods to compute the q-value : https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/p.adjust
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is called for the correction, why can it only take BH / BY ? Or is this fixed now ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check this link.
https://forum.biobakery.org/t/invalid-p-value-correction-options/3321

It seems there was some issue from their end. So I believe if we update the version, it should work but I am not sure if they fixed the bug. Any way to check that ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK good to know, then the wrapper is fine as is, but you can remove this text from the help section (it will confuse the user since the option is not there); and put it back when we update the tool :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will update that

@renu-pal
Copy link
Contributor Author

I have updated the correction help. Please let me know if anything else needs to be updated.

@paulzierep
Copy link
Contributor

Looks good from my side, can we merge it @bgruening ?

@bgruening bgruening merged commit c682728 into galaxyproject:main Apr 29, 2024
11 checks passed
@bgruening
Copy link
Member

Thanks @renu-pal! Great work!

@renu-pal
Copy link
Contributor Author

Thanks for the guidance @paulzierep and @bgruening :)

@renu-pal renu-pal deleted the branch1 branch May 6, 2024 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants